
Keyword Search Result

[Keyword] neural network (855 hits)

Showing 161-180 of 855 hits

  • Estimation of Switching Loss and Voltage Overshoot of Active Gate Driver by Neural Network

    Satomu YASUDA  Yukihisa SUZUKI  Keiji WADA  

     
    BRIEF PAPER

      Publicized: 2020/05/01
      Vol: E103-C No:11
      Page(s): 609-612

    An active gate driver IC that generates arbitrary switching waveforms has been proposed to reduce the switching loss, the voltage overshoot, and the electromagnetic interference (EMI) by optimizing the switching pattern. However, finding the optimal switching pattern is hard because the number of possible pattern combinations is huge. In this paper, a method to estimate the switching loss and the voltage overshoot from the switching pattern with a neural network (NN) is proposed. The implemented NN model obtains reasonable learning results on the data sets.
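
The estimation step described above can be sketched as a small regression network; everything below (pattern length, layer sizes, random weights) is invented for illustration and is not the authors' trained model:

```python
import numpy as np

# A minimal sketch (not the authors' model): a two-layer MLP that maps a
# binary gate-drive switching pattern to two regression outputs, the
# estimated switching loss and the voltage overshoot.
rng = np.random.default_rng(0)
PATTERN_LEN = 16   # assumed number of time slots in one switching pattern
HIDDEN = 32

W1 = rng.normal(0, 0.1, (PATTERN_LEN, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, 2))  # outputs: [loss, overshoot]
b2 = np.zeros(2)

def estimate(pattern: np.ndarray) -> np.ndarray:
    """Forward pass: 0/1 switching pattern -> [switching loss, overshoot]."""
    h = np.maximum(0.0, pattern @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

pattern = rng.integers(0, 2, PATTERN_LEN).astype(float)
pred = estimate(pattern)
```

In the paper the weights would come from training on measured loss/overshoot data; here the forward pass only illustrates the pattern-to-estimate mapping.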

  • Co-Design of Binary Processing in Memory ReRAM Array and DNN Model Optimization Algorithm

    Yue GUAN  Takashi OHSAWA  

     
    PAPER-Integrated Electronics

      Publicized: 2020/05/13
      Vol: E103-C No:11
      Page(s): 685-692

    In recent years, deep neural networks (DNNs) have achieved considerable results on many artificial intelligence tasks, e.g., natural language processing. However, the computational complexity of DNNs is extremely high. Furthermore, the performance of the traditional von Neumann computing architecture has been slowing down due to the memory wall problem. Processing in memory (PIM), which places computation within memory and reduces data movement, breaks the memory wall. ReRAM PIM is considered a viable architecture for DNN accelerators. In this work, a novel design of a ReRAM neuromorphic system is proposed to process DNNs fully in-array efficiently. The binary ReRAM array is composed of 2T2R storage cells and current-mirror sense amplifiers. A dummy-BL reference scheme is proposed for reference voltage generation. A binary DNN (BDNN) model is then constructed and optimized on the MNIST dataset. The model reaches a validation accuracy of 96.33% and is deployed to the ReRAM PIM system. A co-design model optimization method between the hardware device and the software algorithm is proposed, built on the idea of utilizing hardware variance information as uncertainty in the optimization procedure. This method is shown to achieve a feasible hardware design and a generalizable model. Deployed with such a co-designed model, the ReRAM array processes DNNs with high robustness against fabrication fluctuation.
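
The co-design idea of treating device variance as uncertainty can be illustrated roughly as follows; the noise model and sizes are assumptions, not the paper's circuit-level formulation:

```python
import numpy as np

# Sketch of the co-design idea (assumed details): weights are binarized to
# +/-1 for the 2T2R ReRAM array, and device fabrication variance is modeled
# as additive Gaussian noise on the binary weights during training-time
# forward passes, so the learned model sees hardware fluctuation.
rng = np.random.default_rng(1)

def binarize(w: np.ndarray) -> np.ndarray:
    return np.where(w >= 0, 1.0, -1.0)

def noisy_forward(x, w_real, sigma=0.1):
    w_bin = binarize(w_real)                            # what the array stores
    w_dev = w_bin + rng.normal(0, sigma, w_bin.shape)   # device variance
    return x @ w_dev

w = rng.normal(0, 1, (8, 4))
x = rng.normal(0, 1, (2, 8))
y = noisy_forward(x, w)
```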

  • Construction of an Efficient Divided/Distributed Neural Network Model Using Edge Computing

    Ryuta SHINGAI  Yuria HIRAGA  Hisakazu FUKUOKA  Takamasa MITANI  Takashi NAKADA  Yasuhiko NAKASHIMA  

     
    PAPER-Fundamentals of Information Systems

      Publicized: 2020/07/02
      Vol: E103-D No:10
      Page(s): 2072-2082

    Modern deep learning has significantly improved performance and has been used in a wide variety of applications. Since the amount of computation required for the inference process of a neural network is large, it is processed not at the data acquisition location, such as a surveillance camera, but by servers with abundant computing power installed in data centers. Edge computing is receiving considerable attention to solve this problem. However, edge computing can provide only limited computation resources. Therefore, we assumed a divided/distributed neural network model using both the edge device and the server. By processing part of the convolution layers on the edge, the amount of communication becomes smaller than that of the sensor data. In this paper, we have evaluated AlexNet and eight other models in the distributed environment and estimated FPS values with Wi-Fi, 3G, and 5G communication. To reduce communication costs, we also introduced a compression process before communication. This compression may degrade the object recognition accuracy. As necessary conditions, we set FPS to 30 or faster and object recognition accuracy to 69.7% or higher. This value is determined based on an approximation model that binarizes the activations of a neural network. We constructed performance and energy models to find the optimal configuration that consumes minimum energy while satisfying the necessary conditions. Through a comprehensive evaluation, we found the optimal configurations for all nine models. For small models, such as AlexNet, processing the entire model on the edge was best. On the other hand, for huge models, such as VGG16, processing the entire model on the server was best. For medium-size models, the distributed models were good candidates. We confirmed that our model found the most energy-efficient configuration while satisfying the FPS and accuracy requirements, and the distributed models successfully reduced the energy consumption by up to 48.6%, and by 6.6% on average. We also found that HEVC compression is important before transferring the input data or the feature data between the distributed inference processes.
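
The configuration search described above reduces to a small constrained minimization; the candidate table below is entirely invented, and only the selection logic mirrors the paper's description:

```python
# A toy sketch of the optimization described above (all numbers invented):
# for each candidate split point, total energy covers edge compute, data
# transfer, and server compute; keep only configurations meeting the FPS
# and accuracy floors and pick the minimum-energy one.
candidates = [
    # (configuration, fps, accuracy, energy_joules)
    ("edge-only",   32.0, 0.721, 5.0),
    ("split@conv3", 35.0, 0.705, 3.1),
    ("split@conv5", 31.0, 0.690, 2.8),   # fails the accuracy floor
    ("server-only", 28.0, 0.721, 2.5),   # fails the FPS floor
]

FPS_MIN, ACC_MIN = 30.0, 0.697          # necessary conditions from the paper

feasible = [c for c in candidates if c[1] >= FPS_MIN and c[2] >= ACC_MIN]
best = min(feasible, key=lambda c: c[3])
```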

  • Weight Compression MAC Accelerator for Effective Inference of Deep Learning (Open Access)

    Asuka MAKI  Daisuke MIYASHITA  Shinichi SASAKI  Kengo NAKATA  Fumihiko TACHIBANA  Tomoya SUZUKI  Jun DEGUCHI  Ryuichi FUJIMOTO  

     
    PAPER-Integrated Electronics

      Publicized: 2020/05/15
      Vol: E103-C No:10
      Page(s): 514-523

    Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
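
Filter-wise quantization with variable bit precision can be sketched as follows; the per-filter bit widths and the uniform symmetric quantizer are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

# Sketch of filter-by-filter quantization (bit widths invented): each
# convolution filter gets its own bit precision, and execution time scales
# roughly with the average bit width across filters.
rng = np.random.default_rng(2)

def quantize_filter(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization of one filter to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

filters = [rng.normal(0, 1, (3, 3)) for _ in range(4)]
bit_widths = [2, 4, 4, 8]                  # chosen per filter
quantized = [quantize_filter(w, b) for w, b in zip(filters, bit_widths)]
avg_bits = sum(bit_widths) / len(bit_widths)
```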

  • Sentence-Embedding and Similarity via Hybrid Bidirectional-LSTM and CNN Utilizing Weighted-Pooling Attention

    Degen HUANG  Anil AHMED  Syed Yasser ARAFAT  Khawaja Iftekhar RASHID  Qasim ABBAS  Fuji REN  

     
    PAPER-Natural Language Processing

      Publicized: 2020/08/27
      Vol: E103-D No:10
      Page(s): 2216-2227

    Neural networks have received considerable attention in sentence similarity measuring systems due to their efficiency in dealing with semantic composition. However, existing neural network methods are not sufficiently effective in capturing the most significant semantic information buried in an input. To address this problem, a novel weighted-pooling attention layer is proposed to retain the most remarkable attention vector. It has already been established that long short-term memory and convolutional neural networks have a strong ability to accumulate enriched patterns of whole-sentence semantic representation. First, a sentence representation is generated by employing a Siamese structure based on bidirectional long short-term memory and a convolutional neural network. Subsequently, a weighted-pooling attention layer is applied to obtain an attention vector. Finally, the attention vector pair information is leveraged to calculate the sentence similarity score. The amalgamation of bidirectional long short-term memory and a convolutional neural network results in a model with enhanced information extraction and learning capacity. Investigations show that the proposed method outperforms state-of-the-art approaches on datasets for two tasks, namely semantic relatedness and Microsoft Research paraphrase identification. The new model improves the learning capability and boosts the similarity accuracy.
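
The weighted-pooling attention layer can be sketched in a few lines; the shapes and the scoring vector are assumptions standing in for the learned BiLSTM/CNN outputs:

```python
import numpy as np

# Minimal sketch of a weighted-pooling attention layer (shapes assumed):
# score each hidden state with a learned vector, softmax the scores, and
# pool the states with those weights to get one attention vector.
rng = np.random.default_rng(3)
T, D = 6, 8                      # sentence length, hidden size
H = rng.normal(0, 1, (T, D))     # hidden states for one sentence
w = rng.normal(0, 1, D)          # learned scoring vector

scores = H @ w
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()             # attention weights over time steps
attention_vec = alpha @ H        # weighted pooling -> (D,)
```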

  • Sound Event Detection Utilizing Graph Laplacian Regularization with Event Co-Occurrence

    Keisuke IMOTO  Seisuke KYOCHI  

     
    PAPER-Speech and Hearing

      Publicized: 2020/06/08
      Vol: E103-D No:9
      Page(s): 1971-1977

    A limited number of types of sound events occur in an acoustic scene, and some sound events tend to co-occur in the scene; for example, the sound events “dishes” and “glass jingling” are likely to co-occur in the acoustic scene “cooking.” In this paper, we propose a method of sound event detection using graph Laplacian regularization that takes sound event co-occurrence into account. In the proposed method, the occurrences of sound events are expressed as a graph whose nodes indicate the frequencies of event occurrence and whose edges indicate the sound event co-occurrences. This graph representation is then utilized for the model training of sound event detection, which is optimized under an objective function with a regularization term considering the graph structure of sound event occurrence and co-occurrence. Evaluation experiments using the TUT Sound Events 2016 and 2017 datasets and the TUT Acoustic Scenes 2016 dataset show that the proposed method improves the performance of sound event detection by 7.9 percentage points compared with the conventional CNN-BiGRU-based detection method in terms of the segment-based F1 score. In particular, the experimental results indicate that the proposed method enables the detection of co-occurring sound events more accurately than the conventional method.
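
A graph Laplacian regularization term of the kind described above can be written down directly; the co-occurrence counts and embeddings below are invented, but the identity tr(EᵀLE) = ½ Σᵢⱼ Aᵢⱼ‖eᵢ − eⱼ‖² is the standard form of such a regularizer:

```python
import numpy as np

# Sketch of the regularizer (event names and counts invented): build a
# graph whose edge weights are event co-occurrence counts, form the
# Laplacian L = D - A, and penalize embeddings of frequently co-occurring
# events that lie far apart.
A = np.array([[0, 3, 0],          # co-occurrence counts between 3 events,
              [3, 0, 1],          # e.g. "dishes" and "glass jingling"
              [0, 1, 0]], float)  # co-occur often in "cooking"
L = np.diag(A.sum(axis=1)) - A    # graph Laplacian

E = np.array([[0.0, 1.0],         # one row per event (2-D embeddings)
              [0.1, 0.9],
              [1.0, 0.0]])

penalty = np.trace(E.T @ L @ E)
pairwise = 0.5 * sum(A[i, j] * np.sum((E[i] - E[j]) ** 2)
                     for i in range(3) for j in range(3))
```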

  • Neural Networks Probability-Based PWL Sigmoid Function Approximation

    Vantruong NGUYEN  Jueping CAI  Linyu WEI  Jie CHU  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized: 2020/06/11
      Vol: E103-D No:9
      Page(s): 2023-2026

    In this letter, a piecewise linear (PWL) sigmoid function approximation based on the statistical distribution probability of the neurons' values in each layer is proposed to improve the network recognition accuracy using only addition circuits. The sigmoid function is first divided into three fixed regions, and then, according to the distribution probability of the neurons' values, the curve in each region is segmented into sub-regions to reduce the approximation error and improve the recognition accuracy. Experiments performed on Xilinx's FPGA-XC7A200T for the MNIST and CIFAR-10 datasets show that the proposed method achieves 97.45% recognition accuracy with a DNN and 98.42% with a CNN on MNIST, and 72.22% on CIFAR-10, up to 0.84%, 0.57%, and 2.01% higher than other approximation methods using only addition circuits.
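
A plain piecewise-linear sigmoid (without the letter's probability-driven sub-region segmentation) can be sketched as follows; the breakpoints are invented:

```python
import numpy as np

# Sketch of a piecewise-linear sigmoid: sample the true sigmoid at a few
# breakpoints and interpolate linearly between them, so inference needs
# only comparisons, additions, and precomputed slopes.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

breakpoints = np.array([-8.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 8.0])
values = sigmoid(breakpoints)    # exact values stored at the breakpoints

def pwl_sigmoid(x):
    return np.interp(x, breakpoints, values)  # linear between breakpoints

xs = np.linspace(-8, 8, 1001)
max_err = np.abs(pwl_sigmoid(xs) - sigmoid(xs)).max()
```

Denser breakpoints where the neurons' values concentrate (the letter's idea) would shrink the error exactly where it matters for accuracy.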

  • Joint Adversarial Training of Speech Recognition and Synthesis Models for Many-to-One Voice Conversion Using Phonetic Posteriorgrams

    Yuki SAITO  Kei AKUZAWA  Kentaro TACHIBANA  

     
    PAPER-Speech and Hearing

      Publicized: 2020/06/12
      Vol: E103-D No:9
      Page(s): 1978-1987

    This paper presents a method for many-to-one voice conversion using phonetic posteriorgrams (PPGs) based on an adversarial training of deep neural networks (DNNs). A conventional method for many-to-one VC can learn a mapping function from input acoustic features to target acoustic features through separately trained DNN-based speech recognition and synthesis models. However, 1) the differences among speakers observed in PPGs and 2) an over-smoothing effect of generated acoustic features degrade the converted speech quality. Our method performs a domain-adversarial training of the recognition model for reducing the PPG differences. In addition, it incorporates a generative adversarial network into the training of the synthesis model for alleviating the over-smoothing effect. Unlike the conventional method, ours jointly trains the recognition and synthesis models so that they are optimized for many-to-one VC. Experimental evaluation demonstrates that the proposed method significantly improves the converted speech quality compared with conventional VC methods.

  • Deep Learning Approaches for Pathological Voice Detection Using Heterogeneous Parameters

    JiYeoun LEE  Hee-Jin CHOI  

     
    LETTER-Speech and Hearing

      Publicized: 2020/05/14
      Vol: E103-D No:8
      Page(s): 1920-1923

    We propose a deep learning-based model for classifying pathological voices using a convolutional neural network and a feedforward neural network. The model uses combinations of heterogeneous parameters, including mel-frequency cepstral coefficients, linear predictive cepstral coefficients and higher-order statistics. We validate the accuracy of this model using the Massachusetts Eye and Ear Infirmary (MEEI) voice disorder database and the Saarbruecken Voice Database (SVD). Our model achieved an accuracy of 99.3% for MEEI and 75.18% for SVD. This model achieved an accuracy that is 7.18% higher than that of competitive models in previous studies.

  • Silent Speech Interface Using Ultrasonic Doppler Sonar

    Ki-Seung LEE  

     
    PAPER-Speech and Hearing

      Publicized: 2020/05/20
      Vol: E103-D No:8
      Page(s): 1875-1887

    Some non-acoustic modalities have the ability to reveal certain speech attributes that can be used for synthesizing speech signals without acoustic signals. This study validated the use of ultrasonic Doppler frequency shifts caused by facial movements to implement a silent speech interface system. A 40kHz ultrasonic beam is incident to a speaker's mouth region. The features derived from the demodulated received signals were used to estimate the speech parameters. A nonlinear regression approach was employed in this estimation where the relationship between ultrasonic features and corresponding speech is represented by deep neural networks (DNN). In this study, we investigated the discrepancies between the ultrasonic signals of audible and silent speech to validate the possibility for totally silent communication. Since reference speech signals are not available in silently mouthed ultrasonic signals, a nearest-neighbor search and alignment method was proposed, wherein alignment was achieved by determining the optimal pair of ultrasonic and audible features in the sense of a minimum mean square error criterion. The experimental results showed that the performance of the ultrasonic Doppler-based method was superior to that of EMG-based speech estimation, and was comparable to an image-based method.

  • Improvement of Luminance Isotropy for Convolutional Neural Networks-Based Image Super-Resolution

    Kazuya URAZOE  Nobutaka KUROKI  Yu KATO  Shinya OHTANI  Tetsuya HIROSE  Masahiro NUMA  

     
    LETTER-Image

      Vol: E103-A No:7
      Page(s): 955-958

    Convolutional neural network (CNN)-based image super-resolutions are widely used as a high-quality image-enhancement technique. However, in general, they show little to no luminance isotropy. Thus, we propose two methods, “Luminance Inversion Training (LIT)” and “Luminance Inversion Averaging (LIA),” to improve the luminance isotropy of CNN-based image super-resolutions. Experimental results of 2× image magnification show that the average peak signal-to-noise ratio (PSNR) using Luminance Inversion Averaging is about 0.15-0.20dB higher than that for the conventional super-resolution.

  • Heatmapping of Group People Involved in the Group Activity

    Kohei SENDO  Norimichi UKITA  

     
    PAPER

      Publicized: 2020/03/18
      Vol: E103-D No:6
      Page(s): 1209-1216

    This paper proposes a method for heatmapping people who are involved in a group activity. Such people grouping is useful for understanding group activities. In prior work, people grouping is performed based on simple inflexible rules and schemes (e.g., based on proximity among people and with models representing only a constant number of people). In addition, several previous grouping methods require the results of action recognition for individual people, which may include erroneous results. On the other hand, our proposed heatmapping method can group any number of people who dynamically change their deployment. Our method can work independently of individual action recognition. A deep network for our proposed method consists of two input streams (i.e., RGB and human bounding-box images). This network outputs a heatmap representing pixelwise confidence values of the people grouping. Extensive exploration of appropriate parameters was conducted in order to optimize the input bounding-box images. As a result, we demonstrate the effectiveness of the proposed method for heatmapping people involved in group activities.

  • Loss-Driven Channel Pruning of Convolutional Neural Networks

    Xin LONG  Xiangrong ZENG  Chen CHEN  Huaxin XIAO  Maojun ZHANG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized: 2020/02/17
      Vol: E103-D No:5
      Page(s): 1190-1194

    The increase in the computation cost and storage of convolutional neural networks (CNNs) has severely hindered their application on resource-limited devices in recent years. As a result, there is a pressing need to accelerate the networks. In this paper, we propose a loss-driven method to prune redundant channels of CNNs. It identifies unimportant channels by using a Taylor expansion technique with respect to the scaling and shifting factors, and prunes those channels by a fixed percentile threshold. By doing so, we obtain a compact network with fewer parameters and lower FLOPs consumption. In the experimental section, we evaluate the proposed method on the CIFAR datasets with several popular networks, including VGG-19, DenseNet-40 and ResNet-164, and the experimental results demonstrate that the proposed method is able to prune over 70% of channels and parameters with no performance loss. Moreover, iterative pruning can be used to obtain an even more compact network.
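
The Taylor-expansion channel ranking can be sketched as follows; the factor values are random and the 70% percentile is invented, matching only the pruning ratio reported above:

```python
import numpy as np

# Sketch of loss-driven channel ranking (numbers invented): a first-order
# Taylor estimate of the loss change from pruning channel c is
# |g_c * dL/dg_c| for its scaling factor g_c; channels below a fixed
# percentile of this importance score are pruned.
rng = np.random.default_rng(4)
n_channels = 100
gamma = rng.normal(1.0, 0.3, n_channels)       # scaling factors
grad_gamma = rng.normal(0.0, 0.1, n_channels)  # dL/dgamma from backprop

importance = np.abs(gamma * grad_gamma)        # Taylor-expansion criterion
threshold = np.percentile(importance, 70)      # prune the bottom 70%
keep_mask = importance > threshold

n_pruned = int((~keep_mask).sum())
```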

  • Vehicle Key Information Detection Algorithm Based on Improved SSD

    Ende WANG  Yong LI  Yuebin WANG  Peng WANG  Jinlei JIAO  Xiaosheng YU  

     
    PAPER-Intelligent Transport System

      Vol: E103-A No:5
      Page(s): 769-779

    With the rapid development of technology and the economy, the number of cars is increasing rapidly, which brings a series of traffic problems. To solve these traffic problems, the development of intelligent transportation systems is being accelerated in many cities. While the detection of vehicles and their detailed information is of great significance to the development of urban intelligent transportation systems, traditional vehicle detection algorithms are not satisfactory in cases of complex environments and high real-time requirements. Vehicle detection algorithms based on motion information are unable to detect stationary vehicles in video. At present, the application of deep learning methods to the task of target detection effectively alleviates the problems of traditional algorithms. However, there are few datasets for detailed vehicle information, i.e. driver, car inspection sign, copilot, plate and vehicle object, which are key information for intelligent transportation. This paper constructs a deep learning dataset containing 10,000 representative images of vehicles and their key information. Then, the SSD (Single Shot MultiBox Detector) target detection algorithm is improved and the improved algorithm is applied to a video surveillance system. The detection accuracy of small targets is improved by adding deconvolution modules to the detection network. The experimental results show that the proposed method can detect the vehicle, driver, car inspection sign, copilot and plate, which are the vehicle key information, at the same time, and the improved algorithm achieves better accuracy and real-time performance in video surveillance than the SSD algorithm.

  • Patient-Specific ECG Classification with Integrated Long Short-Term Memory and Convolutional Neural Networks

    Jiaquan WU  Feiteng LI  Zhijian CHEN  Xiaoyan XIANG  Yu PU  

     
    PAPER-Biological Engineering

      Publicized: 2020/02/13
      Vol: E103-D No:5
      Page(s): 1153-1163

    This paper presents an automated patient-specific ECG classification algorithm, which integrates long short-term memory (LSTM) and convolutional neural networks (CNN). While the LSTM extracts temporal features, such as heart rate variability (HRV) and beat-to-beat correlation, from sequential heartbeats, the CNN captures detailed morphological characteristics of the current heartbeat. To further improve the classification performance, adaptive segmentation and re-sampling are applied to align the heartbeats of different patients with various heart rates. In addition, a novel clustering method is proposed to identify the most representative patterns from the common training data. Evaluated on the MIT-BIH arrhythmia database, our algorithm shows superior accuracy for both ventricular ectopic beat (VEB) and supraventricular ectopic beat (SVEB) recognition. In particular, the sensitivity and positive predictive rate for SVEB increase by more than 8.2% and 8.8%, respectively, compared with prior works. Since our patient-specific classification does not require manual feature extraction, it is potentially applicable to embedded devices for automatic and accurate arrhythmia monitoring.
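
The re-sampling used to align heartbeats of different lengths can be sketched with linear interpolation; the beat lengths, waveform, and target length are assumptions:

```python
import numpy as np

# Sketch of the alignment idea (lengths invented): heartbeats from patients
# with different heart rates span different numbers of samples, so each
# beat is re-sampled to a fixed length before the CNN sees it.
TARGET_LEN = 128

def resample_beat(beat: np.ndarray, target_len: int = TARGET_LEN):
    src = np.linspace(0.0, 1.0, len(beat))
    dst = np.linspace(0.0, 1.0, target_len)
    return np.interp(dst, src, beat)

slow_beat = np.sin(np.linspace(0, 2 * np.pi, 200))  # slow heart rate
fast_beat = np.sin(np.linspace(0, 2 * np.pi, 90))   # fast heart rate

aligned = [resample_beat(b) for b in (slow_beat, fast_beat)]
```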

  • A Highly Configurable 7.62GOP/s Hardware Implementation for LSTM

    Yibo FAN  Leilei HUANG  Kewei CHEN  Xiaoyang ZENG  

     
    PAPER-Integrated Electronics

      Publicized: 2019/11/27
      Vol: E103-C No:5
      Page(s): 263-273

    The neural network has been one of the most useful techniques in the areas of speech recognition, language translation and image analysis in recent years. Long Short-Term Memory (LSTM), a popular type of recurrent neural network (RNN), has been widely implemented on CPUs and GPUs. However, those software implementations offer poor parallelism, while the existing hardware implementations lack configurability. To bridge this gap, a highly configurable 7.62 GOP/s hardware implementation for LSTM is proposed in this paper. To achieve this goal, the work flow is carefully arranged to make the design compact and high-throughput; the structure is carefully organized to make the design configurable; the data buffering and compression strategy is carefully chosen to lower the bandwidth without increasing the complexity of the structure; and the data type, logistic sigmoid (σ) function and hyperbolic tangent (tanh) function are carefully optimized to balance the hardware cost and accuracy. This work achieves a performance of 7.62 GOP/s @ 238 MHz on an XCZU6EG FPGA, using only 3K look-up tables (LUTs). Compared with the implementation on an Intel Xeon E5-2620 CPU @ 2.10GHz, this work achieves about 90× speedup for small networks and 25× speedup for large ones. The resource consumption is also much less than that of state-of-the-art works.

  • Gradient-Enhanced Softmax for Face Recognition

    Linjun SUN  Weijun LI  Xin NING  Liping ZHANG  Xiaoli DONG  Wei HE  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized: 2020/02/07
      Vol: E103-D No:5
      Page(s): 1185-1189

    This letter proposes a gradient-enhanced softmax supervisor for face recognition (FR) based on a deep convolutional neural network (DCNN). The proposed supervisor computes the constant-normalized cosine to obtain the score for each class, using a combination of the intra-class score and the soft maximum of the inter-class scores as the objective function. This mitigates the vanishing gradient problem in the conventional softmax classifier. Experiments on the public Labeled Faces in the Wild (LFW) database show that the proposed supervisor achieves better results than current state-of-the-art softmax-based approaches for FR.

  • A Deep Neural Network-Based Approach to Finding Similar Code Segments

    Dong Kwan KIM  

     
    LETTER-Software Engineering

      Publicized: 2020/01/17
      Vol: E103-D No:4
      Page(s): 874-878

    This paper presents a Siamese architecture model with two identical Convolutional Neural Networks (CNNs) to identify code clones; two code fragments are represented as Abstract Syntax Trees (ASTs), CNN-based subnetworks extract feature vectors from the ASTs of pairwise code fragments, and the output layer produces how similar or dissimilar they are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at source code or bytecode levels.
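
The Siamese decision stage can be sketched as follows; the feature vectors and the similarity threshold are invented, and the cosine score stands in for whatever output layer the paper actually uses:

```python
import numpy as np

# Sketch of the Siamese decision stage (feature extraction omitted): the
# two identical CNN subnetworks each reduce an AST to a feature vector,
# and the pair is scored for clone-ness, here with cosine similarity
# against a validation-tuned threshold (0.8 below is invented).
def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_clone(feat_a, feat_b, threshold=0.8):
    return cosine(feat_a, feat_b) >= threshold

f1 = np.array([0.9, 0.1, 0.4, 0.7])     # stand-in AST feature vectors
f2 = np.array([0.88, 0.12, 0.41, 0.69]) # near-duplicate of f1
f3 = np.array([-0.5, 0.9, -0.2, 0.1])   # unrelated fragment
```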

  • Robust CAPTCHA Image Generation Enhanced with Adversarial Example Methods

    Hyun KWON  Hyunsoo YOON  Ki-Woong PARK  

     
    LETTER-Information Network

      Publicized: 2020/01/15
      Vol: E103-D No:4
      Page(s): 879-882

    Malicious attackers on the Internet use automated attack programs to disrupt the use of services via mass spamming, unnecessary bulletin board posting, and account creation. A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is used as a security solution to prevent such automated attacks. CAPTCHA is a system that determines whether the user is a machine or a person by providing distorted letters, voices, and images that only humans can understand. However, new attack techniques such as optical character recognition (OCR) and deep neural networks (DNN) have been used to bypass CAPTCHA. In this paper, we propose a method to generate CAPTCHA images by using the fast gradient sign method (FGSM), iterative FGSM (I-FGSM), and the DeepFool method. We used the CAPTCHA images provided by Python as the dataset and TensorFlow as the machine learning library. The experimental results show that the CAPTCHA images generated via the FGSM, I-FGSM, and DeepFool methods exhibit a 0% recognition rate with ε=0.15 for FGSM, a 0% recognition rate with α=0.1 and 50 iterations for I-FGSM, and a 45% recognition rate with 150 iterations for the DeepFool method.
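
The FGSM perturbation at the core of the method is one line; the image and gradient below are random stand-ins for a real CAPTCHA and the attacker model's gradient, with ε=0.15 taken from the experiment above:

```python
import numpy as np

# Sketch of the FGSM step itself (the CAPTCHA pipeline and the gradient of
# the attacker's recognition model are stood in by random arrays):
# x_adv = clip(x + eps * sign(dLoss/dx)).
rng = np.random.default_rng(5)
eps = 0.15

x = rng.uniform(0.0, 1.0, (28, 28))      # a CAPTCHA image in [0, 1]
grad = rng.normal(0.0, 1.0, x.shape)     # stand-in for dLoss/dx

x_adv = np.clip(x + eps * np.sign(grad), 0.0, 1.0)
max_change = np.abs(x_adv - x).max()
```

I-FGSM would repeat this step with a smaller α and re-clip after each iteration; DeepFool instead walks toward the nearest decision boundary.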

  • Software Development Effort Estimation from Unstructured Software Project Description by Sequence Models

    Tachanun KANGWANTRAKOOL  Kobkrit VIRIYAYUDHAKORN  Thanaruk THEERAMUNKONG  

     
    PAPER

      Publicized: 2020/01/14
      Vol: E103-D No:4
      Page(s): 739-747

    Most existing methods of effort estimation in software development are manual, labor-intensive and subjective, resulting in overestimation that loses bids and underestimation that loses money. This paper investigates the effectiveness of sequence models for estimating development effort, in the form of man-months, from software project data. Four architectures: (1) average word-vector with Multi-Layer Perceptron (MLP), (2) average word-vector with Support Vector Regression (SVR), (3) Gated Recurrent Unit (GRU) sequence model, and (4) Long Short-Term Memory (LSTM) sequence model are compared in terms of man-month difference. The approach is evaluated using two datasets: ISEM (1,573 English software project descriptions, raw text) and ISBSG (9,100 software project records, a structured data table describing the characteristics of software projects). The LSTM sequence model achieves the lowest and the second-lowest mean absolute errors, 0.705 man-months on ISEM and 14.077 man-months on ISBSG, respectively. The MLP model achieves the lowest mean absolute error on ISBSG, 14.069 man-months.
